Efficient Time-series Subsequence Matching Using Duality in Constructing Windows
Abstract
Subsequence matching in time-series databases is an important problem in data mining and has attracted a lot of research interest. It is the problem of finding the data sequences that contain subsequences similar to a given query sequence, and of finding the offsets of these subsequences in the original data sequences. In this paper, we propose a new approach (Dual Match) to subsequence matching that exploits duality in constructing windows, and we show that this approach significantly improves performance. Dual Match divides data sequences into disjoint windows and the query sequence into sliding windows, and is thus the dual of the approach by Faloutsos et al. (FRM in short), which divides data sequences into sliding windows and the query sequence into disjoint windows. FRM causes a lot of false alarms (i.e., candidates that do not qualify) because, to save storage space for the index, it stores minimum bounding rectangles rather than the individual points representing windows. Dual Match solves this problem by storing the points directly without incurring excessive storage overhead. Experimental results show that, in most cases, Dual Match provides a large improvement over FRM in both false alarms and performance, given the same amount of storage space. In particular, for low selectivities (less than 10^-4), Dual Match drastically reduces the number of candidates (down to as little as 1/8800 of that for FRM), reduces the number of page accesses by up to 26.9 times, and improves performance up to 430-fold. On the other hand, for high selectivities (more than 10^-2), it shows a very minor degradation (less than 29%) by all three measures. For selectivities in between (10^-4 to 10^-2), Dual Match performs slightly better than FRM. Dual Match also provides excellent performance in index creation. Experimental results show that it is 4.10 to 25.6 times faster than FRM in building indexes of approximately the same size. The main reason is that it requires far fewer transformations, which are a major part of the CPU overhead, than FRM does. Overall, these results indicate that our approach provides a new paradigm in subsequence matching that improves performance significantly in large database applications.
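The window construction described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it replaces the index with a brute-force comparison, skips the lower-dimensional transformation, and uses plain Euclidean distance with the full tolerance eps as the window-level filter. All function names are hypothetical. It relies on the fact that any subsequence of length at least 2w - 1 contains at least one complete disjoint window of length w.

```python
import math


def disjoint_windows(seq, w):
    """Yield (offset, window) for non-overlapping windows of length w."""
    for start in range(0, len(seq) - w + 1, w):
        yield start, seq[start:start + w]


def sliding_windows(seq, w):
    """Yield (offset, window) for every window of length w, step 1."""
    for start in range(len(seq) - w + 1):
        yield start, seq[start:start + w]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dual_match_sketch(data, query, w, eps):
    """Return (matching offsets, number of candidates).

    Assumption: brute-force window comparison stands in for the index
    search, and no transformation is applied to the windows.
    """
    if len(query) < 2 * w - 1:
        raise ValueError("query must have length >= 2*w - 1")
    data_windows = list(disjoint_windows(data, w))
    last_start = len(data) - len(query)
    candidates = set()
    for q_off, q_win in sliding_windows(query, w):
        for d_off, d_win in data_windows:
            start = d_off - q_off  # implied start of the subsequence
            if 0 <= start <= last_start and euclidean(q_win, d_win) <= eps:
                candidates.add(start)
    # Post-processing: verify each candidate against the full query.
    matches = sorted(
        s for s in candidates
        if euclidean(data[s:s + len(query)], query) <= eps
    )
    return matches, len(candidates)


def brute_force(data, query, eps):
    """Sequential scan, used only to check the sketch."""
    n = len(query)
    return [s for s in range(len(data) - n + 1)
            if euclidean(data[s:s + n], query) <= eps]


data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 3, 4, 5, 6, 7, 0, 0]
query = [3, 4, 5, 6, 7]
matches, n_candidates = dual_match_sketch(data, query, w=3, eps=0.5)
print(matches, n_candidates)
```

Because the distance between two aligned windows never exceeds the distance between the full sequences, filtering windows with eps cannot discard a true match; the verification step then removes any false alarms.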
Similar resources
Efficient time-series subsequence matching using duality in constructing windows
In this paper, we propose a new subsequence matching method, Dual Match. Dual Match exploits duality in constructing windows and significantly improves performance. Dual Match divides data sequences into disjoint windows and the query sequence into sliding windows, and thus, is a dual approach of the one by Faloutsos et al. (Proceedings of the ACM SIGMOD International Conference on Management o...
Duality-Based Subsequence Matching in Time-Series Databases
In this paper, we propose a new subsequence matching method, Dual Match, which exploits duality in constructing windows and significantly improves performance. Dual Match divides data sequences into disjoint windows and the query sequence into sliding windows, and thus, is a dual approach of the one by Faloutsos et al. (FRM in short), which divides data sequences into sliding windows and the quer...
Linear Detrending Subsequence Matching in Time-Series Databases
Each time-series has its own linear trend, the directionality of the time-series, and removing the linear trend is crucial for getting more intuitive matching results. Supporting linear detrending in subsequence matching is a challenging problem due to the huge number of possible subsequences. In this paper we define this problem as linear detrending subsequence matching and propose its efficien...
On Differentially Private Longest Increasing Subsequence Computation in Data Stream
Many important applications require continuous computation of statistics over data streams. Activity monitoring, surveillance, and fraud detection are some settings where it is crucial for the monitoring applications to protect users' sensitive information in addition to efficiently computing the required statistics. In the last two decades, a broad range of techniques for time-series and str...
A Single Index Approach for Distortion-Free Time-Series Subsequence Matching
In this paper we propose a new method for distortion-free time-series subsequence matching. Our method is distortion-free in the sense that it preprocesses time-series to remove the distortions of offset translation and amplitude scaling at the same time. We call this preprocessing the normalization transform in this paper. Previous work on the normalization-transformed subsequence m...